Foundry Local Samples

This repository contains sample code demonstrating how to load and initialize local models using Microsoft’s Foundry Local service, both as a standalone application and integrated with .NET Aspire.

Samples Included

1. Basic Foundry Local Sample (`Qwen25-FoundryLocal-Sample`)

A simple console application that demonstrates:

Direct initialization of Qwen2.5 model using Foundry Local
Basic chat completion
Streaming chat responses
Proper resource management and error handling

2. .NET Aspire Integration Sample (`Qwen25-Aspire-Sample`)

A complete web application demonstrating:

.NET Aspire orchestration of Foundry Local services
Web API endpoints for chat interactions
Modern web interface for testing
Proper separation of concerns between App Host and client application

Prerequisites

.NET 8.0 SDK or later
Visual Studio 2022 17.8+ or Visual Studio Code with C# extension
Windows 11 (recommended for optimal hardware detection)
Foundry Local installed - See installation guide
Sufficient system memory (4GB+ available RAM recommended for Qwen2.5-0.5B)

Important Note About Package Availability

⚠️ The .NET Aspire integration shown in the Build 2025 session uses packages that are currently in private preview.

What Works Now (✅)

Basic Foundry Local Sample - Uses publicly available packages
Direct integration with Foundry Local SDK
OpenAI-compatible API access

What’s Coming Soon (🔄)

Full .NET Aspire Integration - Packages like Microsoft.Extensions.Hosting.FoundryLocal are in development
Seamless orchestration as shown in the Build session

Current Status ✅

The basic sample compiles and runs successfully. It can:

Start and connect to Foundry Local services
Discover available models in the catalog (50+ models found)
Select appropriate models (Qwen, Phi, Mistral, etc.)
Attempt to load models for inference

⚠️ Important: Models must be downloaded before they can be loaded. The first time you try to load a model, you’ll get a “Model not found” error - this is expected. See the “Downloading Models” section below for instructions.

Hardware Requirements

The Qwen2.5 models have different hardware requirements:

Qwen2.5-0.5B: ~1GB RAM, works on most modern devices
Qwen2.5-1.5B: ~3GB RAM, better quality responses
Qwen2.5-3B: ~6GB RAM, highest quality responses

Foundry Local will automatically select the best model variant for your hardware.
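
Foundry Local picks the hardware variant, but the model size is still your choice. As a rough illustration, the sketch below picks the largest Qwen2.5 model whose approximate RAM figure from the list above fits a given memory budget. `ModelPicker` and `PickLargestThatFits` are hypothetical names, not part of the samples or of the Foundry Local SDK.

```csharp
using System;
using System.Linq;

public static class ModelPicker
{
    // Approximate RAM needs in GB, copied from the list above.
    private static readonly (string Name, double RamGb)[] Models =
    {
        ("Qwen2.5-0.5B", 1.0),
        ("Qwen2.5-1.5B", 3.0),
        ("Qwen2.5-3B", 6.0),
    };

    // Returns the largest model whose RAM estimate fits the budget,
    // or null when even the smallest model does not fit.
    public static string? PickLargestThatFits(double availableRamGb) =>
        Models.Where(m => m.RamGb <= availableRamGb)
              .OrderByDescending(m => m.RamGb)
              .Select(m => m.Name)
              .FirstOrDefault();
}
```

For example, `ModelPicker.PickLargestThatFits(4.0)` returns `"Qwen2.5-1.5B"`. Leave some headroom in the budget, since the figures are approximate.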

Getting Started

Option 1: Basic Foundry Local Sample (✅ Recommended - Works Now)

Navigate to the basic sample directory:

```
cd "c:\dev\Samples\20250629 Build 2025 session\Qwen25-FoundryLocal-Sample"
```

Restore packages:
```
dotnet restore
```
Run the discovery application (recommended - shows available models):
```
dotnet run
```
Or run the simple version (faster startup):
```
dotnet run simple
```
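Note: The model must already be downloaded (~800MB for the smaller models, which may take several minutes depending on your internet connection). If it is not, the first run fails with a “Model not found” error; see the “Downloading Models” section below.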

Option 2: .NET Aspire Sample (⚠️ Not Currently Available)

Navigate to the Aspire sample directory:

```
cd "c:\dev\Samples\20250629 Build 2025 session\Qwen25-Aspire-Sample"
```

Restore packages:
```
dotnet restore
```
Run the App Host (this will start both the Foundry Local service and the web application):
```
dotnet run --project Qwen25.AppHost
```
Open your browser and navigate to the web application URL shown in the console (typically https://localhost:7000 or similar)
Use the .NET Aspire dashboard to monitor the services (URL will be shown in console)

Key Features Demonstrated

Foundry Local Features

Automatic Hardware Detection: Foundry Local automatically detects your GPU, CPU, and NPU capabilities
Model Optimization: Automatically selects the best quantization and optimization for your hardware
OpenAI Compatibility: Uses familiar OpenAI-compatible APIs
Local Execution: Everything runs locally - no data sent to external services
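
To make the OpenAI compatibility concrete, here is a minimal sketch that talks to the local service with plain `HttpClient` instead of an SDK. It rests on several assumptions: that the service listens on `http://localhost:5272` (the default port mentioned under Troubleshooting), that it serves the conventional OpenAI route `/v1/chat/completions`, and that you pass a model ID your installation actually reports. `LocalChat` is a hypothetical helper name.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class LocalChat
{
    // Builds an OpenAI-style chat completion request body.
    public static string BuildRequestJson(string modelId, string userMessage) =>
        JsonSerializer.Serialize(new
        {
            model = modelId,
            messages = new[] { new { role = "user", content = userMessage } },
            stream = false
        });

    // Posts the request and returns the text of the first choice.
    // baseUrl is an assumption, e.g. "http://localhost:5272".
    public static async Task<string?> AskAsync(
        HttpClient http, string baseUrl, string modelId, string userMessage)
    {
        using var body = new StringContent(
            BuildRequestJson(modelId, userMessage), Encoding.UTF8, "application/json");
        using var response = await http.PostAsync($"{baseUrl}/v1/chat/completions", body);
        response.EnsureSuccessStatusCode();

        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        return doc.RootElement.GetProperty("choices")[0]
                  .GetProperty("message")
                  .GetProperty("content")
                  .GetString();
    }
}
```

Verify the actual endpoint and model ID for your installation before relying on these values; both may differ from the assumptions above.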

.NET Aspire Features

Service Orchestration: Manages the lifecycle of Foundry Local services
Dependency Management: Ensures model download completes before starting the web application
Telemetry Integration: Rich logging and monitoring through OpenTelemetry
Health Checks: Built-in health monitoring for all services

API Endpoints (Aspire Sample)

POST /api/chat - Send a message and get a complete response
POST /api/chat/stream - Send a message and get a streaming response

Example API Usage

```
POST /api/chat
{
  "message": "What are the benefits of running AI models locally?"
}
```
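
A C# client for this endpoint could look like the sketch below. It assumes only what the example above shows: a JSON body with a single `message` property. The response schema is not documented here, so the method returns the raw response body. `ChatRequest` and `ChatApiClient` are hypothetical names.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

// Serialized as {"message":"..."} because PostAsJsonAsync uses
// web defaults (camelCase property names).
public sealed record ChatRequest(string Message);

public static class ChatApiClient
{
    // Sends one message to POST /api/chat and returns the raw response body.
    // The HttpClient must have BaseAddress set to the web application's URL.
    public static async Task<string> SendAsync(HttpClient http, string message)
    {
        using var response = await http.PostAsJsonAsync("/api/chat", new ChatRequest(message));
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

Set `BaseAddress` to the URL printed in the console when the App Host starts, for example `new HttpClient { BaseAddress = new Uri("https://localhost:7000") }`.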

Configuration Options

Model Selection

You can change the model by modifying the model name in the configuration:

```
// For basic sample (Program.cs)
var modelName = "Qwen2.5-1.5B"; // Larger model

// For Aspire sample (AppHost/Program.cs)
var foundryResource = builder.AddFoundryLocalResource("ai")
    .AddModel("chat", "Qwen2.5-1.5B");
```

Available Models

Qwen2.5-0.5B - Fastest, smallest (500M parameters)
Qwen2.5-1.5B - Balanced performance (1.5B parameters)
Qwen2.5-3B - Highest quality (3B parameters)

Troubleshooting

Package/Build Issues

Package downgrade warnings: If you see System.ClientModel version conflicts, remove the explicit reference - the OpenAI SDK will bring in the correct version automatically
NuGet restore errors: Clear the NuGet cache with `dotnet nuget locals all --clear` and delete the bin/obj folders
Missing packages: Ensure you have the latest .NET 8 SDK installed

Model Download Issues

“Model not found in local models” Error: This is expected behavior! Models need to be downloaded before they can be loaded.

To download models, you have several options:

Via Foundry CLI: Use `foundry model run <model-alias>` (e.g., `foundry model run phi-3-mini-4k`), which downloads the model automatically
Via Azure AI Studio: Browse to the Foundry Local models section and download models
Interactive testing: The `foundry model run` command starts an interactive chat after download

Common download issues:

Ensure stable internet connection for initial download
Check available disk space (the models listed in this README range from about 500MB to 8.6GB)
Verify firewall/antivirus isn’t blocking the download
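
The disk space check can be automated before starting a download. The sketch below is an illustration only: `DiskCheck` is a hypothetical helper, the 20% safety margin is an arbitrary choice, and the path is whichever drive your models are stored on, which this README does not specify.

```csharp
using System;
using System.IO;

public static class DiskCheck
{
    // Returns true if the drive containing 'path' has room for a model of
    // the given size, plus a safety margin (20% by default).
    public static bool HasRoomFor(string path, double modelSizeGb, double safetyFactor = 1.2)
    {
        string fullPath = Path.GetFullPath(path);
        string root = Path.GetPathRoot(fullPath)
            ?? throw new ArgumentException("Path has no root.", nameof(path));

        var drive = new DriveInfo(root);
        double requiredBytes = modelSizeGb * 1024 * 1024 * 1024 * safetyFactor;
        return drive.AvailableFreeSpace >= requiredBytes;
    }
}
```

For example, `DiskCheck.HasRoomFor(@"C:\", 8.6)` checks whether the C: drive can hold the 8.6GB phi-4 download with some headroom.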

Performance Issues

Close other memory-intensive applications
Consider using a smaller model (0.5B) if experiencing slowdowns
Monitor CPU/GPU usage in Task Manager

Foundry Local Service Issues

“Model not found in catalog” errors: Run the discovery sample (DiscoverAndRunModels.cs) to see available models, or check if your Foundry Local installation is up to date
Empty model catalog: Ensure Foundry Local is properly installed and can access the internet to download the model catalog
Service startup failures: Ensure no other instances are running
Port conflicts: Check that port 5272 (the default) is not in use by another application
Permission errors: Run with appropriate permissions, especially on first install
Check Windows Event Viewer for detailed error messages
Verify .NET 8 runtime is properly installed
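
For the port conflict check, a small sketch using the standard `System.Net.NetworkInformation` API can tell you whether something is already listening on a port. `PortCheck` is a hypothetical helper name; 5272 is the default port mentioned above.

```csharp
using System.Linq;
using System.Net.NetworkInformation;

public static class PortCheck
{
    // Returns true if any process on this machine is listening on the TCP port.
    public static bool IsTcpPortInUse(int port) =>
        IPGlobalProperties.GetIPGlobalProperties()
            .GetActiveTcpListeners()
            .Any(endpoint => endpoint.Port == port);
}
```

Note that `PortCheck.IsTcpPortInUse(5272)` returning `true` is expected when Foundry Local itself is already running. It only indicates a conflict if the service fails to start while another application holds the port.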

Architecture Notes

Basic Sample Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ Console App     │───▶│ Foundry Manager  │───▶│ Local AI Model  │
│ (Your Code)     │    │ (Service Layer)  │    │ (Qwen2.5)       │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```

Aspire Sample Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ .NET Aspire     │───▶│ Foundry Local    │───▶│ Local AI Model  │
│ App Host        │    │ Service          │    │ (Qwen2.5)       │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                                              ▲
         ▼                                              │
┌─────────────────┐    ┌──────────────────┐            │
│ Web Application │───▶│ HTTP API Client  │────────────┘
│ (Razor Pages)   │    │ (OpenAI Compat)  │
└─────────────────┘    └──────────────────┘
```

Performance Expectations

| Model | Size | RAM usage | CPU throughput (approx.) | GPU throughput (with support) |
| --- | --- | --- | --- | --- |
| Qwen2.5-0.5B | ~800MB | ~1GB | 2-5 tokens/sec | 10-50 tokens/sec |
| Qwen2.5-1.5B | ~2GB | ~3GB | 1-3 tokens/sec | 5-30 tokens/sec |
| Qwen2.5-3B | ~6GB | ~6GB | 0.5-2 tokens/sec | 3-20 tokens/sec |

Performance varies significantly based on hardware configuration.
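
To turn these throughput ranges into waiting times, divide the expected response length by the tokens-per-second figure. For example, a 200-token answer from Qwen2.5-0.5B on CPU (2-5 tokens/sec) takes roughly 200 / 5 = 40 seconds at best and 200 / 2 = 100 seconds at worst. The helper below is a hypothetical sketch of that arithmetic, not part of the samples.

```csharp
public static class Throughput
{
    // Estimates generation time from a response length and a throughput range.
    // The best case uses the highest throughput, the worst case the lowest.
    public static (double BestSeconds, double WorstSeconds) EstimateSeconds(
        int tokens, double minTokensPerSec, double maxTokensPerSec) =>
        (tokens / maxTokensPerSec, tokens / minTokensPerSec);
}
```

`Throughput.EstimateSeconds(200, 2, 5)` returns `(40, 100)`, matching the worked example above.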

Security Notes

All processing happens locally - no data leaves your device
Models are cached locally after first download
No API keys or external authentication required
Consider firewall rules if running in production environments

Next Steps

Explore function calling capabilities
Integrate with existing .NET applications
Experiment with different model sizes
Add custom system prompts and fine-tuning
Scale to multiple models using Aspire orchestration

License

This sample code is provided under the MIT License. See LICENSE file for details.

Downloading Models

Before you can use any model, it must be downloaded to your local machine. The discovery sample shows all available models in the catalog, but they need to be downloaded before loading.

Quick Start - Download a Model

Based on the available models, here are the recommended downloads:

For beginners (smallest, fastest):

```
# Download and run Phi-3 Mini (2.2GB) - good balance of size and capability
foundry model run phi-3-mini-4k

# Alternative: Download and run Qwen 2.5 0.5B (smaller, very fast)
foundry model run qwen2.5-0.5b
```

For better quality responses:

```
# Download and run Phi-4 (8.6GB) - latest and most capable
foundry model run phi-4

# Download and run Mistral 7B (4GB) - good general purpose model
foundry model run mistral-7b-v0.2
```

How It Works

The Foundry CLI simplifies the process by combining download and execution:

`foundry model list` - Shows all available models in the catalog
`foundry model run <model-name>` - Downloads the model (if needed) and starts an interactive chat
Your .NET samples then work automatically - once a model is downloaded, your code can load it

Key Benefits:

✅ One command does everything - no separate download step
✅ Automatic hardware optimization - selects best GPU/CPU variant
✅ Interactive testing - chat with the model before using in code
✅ Background service - models remain loaded for your .NET applications

Available Download Methods

Foundry CLI (Recommended)

```
# Download and run models (downloads automatically if not present)
foundry model run phi-3-mini-4k
foundry model run qwen2.5-0.5b
foundry model run phi-4

# List available models in catalog
foundry model list

# Check service status
foundry service status
```
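
The non-interactive commands (`foundry model list`, `foundry service status`) can also be invoked from .NET code, for example to surface the catalog in your own diagnostics. The sketch below uses the standard `System.Diagnostics.Process` API; `CliRunner` is a hypothetical helper, and it assumes the `foundry` executable is on your PATH. The format of the CLI output is not described here, so the helper returns it unparsed.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class CliRunner
{
    // Runs a command-line tool and returns its exit code and standard output.
    public static async Task<(int ExitCode, string Output)> RunAsync(
        string fileName, params string[] args)
    {
        var startInfo = new ProcessStartInfo(fileName)
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,
            CreateNoWindow = true,
        };
        foreach (var arg in args)
        {
            startInfo.ArgumentList.Add(arg);
        }

        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException($"Could not start '{fileName}'.");
        string output = await process.StandardOutput.ReadToEndAsync();
        await process.WaitForExitAsync();
        return (process.ExitCode, output);
    }
}
```

Usage: `var (code, output) = await CliRunner.RunAsync("foundry", "model", "list");`. Avoid `foundry model run` here, since it starts an interactive chat and would not return on its own.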

Azure AI Studio

- Open Azure AI Studio
- Navigate to the Foundry Local section
- Browse available models and click download

Direct Download and Run (Easiest)

- The `foundry model run` command automatically downloads models if not present
- No separate download step needed - just run the model you want to use

Model Selection Guide

The catalog includes the following models, by category:

Phi Models (Microsoft):

phi-4 - Latest, best quality (8.6GB)
phi-3-mini-128k - Long context support (2.2GB)
phi-3-mini-4k - Standard context (2.2GB)

Qwen Models (Alibaba):

qwen2.5-0.5b - Smallest, fastest (500MB)
Other Qwen variants available in catalog

Mistral Models:

mistral-7b-v0.2 - Good general purpose (4GB)

Hardware-Specific Downloads

The catalog lists different variants for different hardware:

`*-cuda-gpu` - NVIDIA GPU acceleration
`*-generic-gpu` - General GPU support
`*-generic-cpu` - CPU-only execution

Foundry Local automatically selects the best variant for your hardware.
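
If you log which variant was loaded, a small helper can translate the suffix into the descriptions above. This is a sketch under the assumption that full model IDs end with one of the three suffixes listed; `VariantInfo` is a hypothetical name, and the ID in the usage note is made up for illustration.

```csharp
using System;

public static class VariantInfo
{
    // Maps a model ID's hardware suffix to a human-readable description.
    public static string Describe(string modelId)
    {
        if (modelId.EndsWith("-cuda-gpu", StringComparison.OrdinalIgnoreCase))
            return "NVIDIA GPU acceleration";
        if (modelId.EndsWith("-generic-gpu", StringComparison.OrdinalIgnoreCase))
            return "General GPU support";
        if (modelId.EndsWith("-generic-cpu", StringComparison.OrdinalIgnoreCase))
            return "CPU-only execution";
        return "Unknown variant";
    }
}
```

For a hypothetical ID such as `example-model-generic-cpu`, `VariantInfo.Describe` returns `"CPU-only execution"`.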

Download Troubleshooting

Common Issues:

Slow downloads: Models are large (500MB-10GB), ensure stable internet
Disk space: Check you have enough free space before downloading
Network issues: Corporate firewalls may block downloads
Permission errors: Run with administrator privileges if needed

Verify Download:

```
# List all available models and their status
foundry model list

# Check if service is running
foundry service status
```

What Happens After Download

Once downloaded, your sample code will work without the “Model not found” error:

Discovery Sample: Will show downloaded models and successfully load them
Simple Sample: Will automatically find and use downloaded models
Chat Completion: Will work with full streaming and regular responses
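
If you consume streaming responses over raw HTTP rather than through an SDK, each chunk has to be parsed by hand. The sketch below assumes the stream follows the usual OpenAI convention: server-sent event lines of the form `data: {json}`, with the text fragment under `choices[0].delta.content`, and a final `data: [DONE]` line. Whether your endpoint streams in exactly this shape is an assumption to verify. `StreamParser` is a hypothetical helper name.

```csharp
using System;
using System.Text.Json;

public static class StreamParser
{
    // Returns the text fragment carried by one streamed line,
    // or null for blank lines, the [DONE] marker, and chunks without text.
    public static string? ExtractDelta(string line)
    {
        const string prefix = "data: ";
        if (!line.StartsWith(prefix, StringComparison.Ordinal)) return null;

        string payload = line.Substring(prefix.Length).Trim();
        if (payload == "[DONE]") return null;

        using var doc = JsonDocument.Parse(payload);
        var choices = doc.RootElement.GetProperty("choices");
        if (choices.GetArrayLength() == 0) return null;

        if (choices[0].TryGetProperty("delta", out var delta) &&
            delta.TryGetProperty("content", out var content) &&
            content.ValueKind == JsonValueKind.String)
        {
            return content.GetString();
        }
        return null;
    }
}
```

Call `ExtractDelta` on each line read from the response stream and write every non-null result to the console as it arrives.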

Next Steps After Download

Navigate to the sample directory: `cd "c:\dev\Samples\20250629 Build 2025 session\Qwen25-FoundryLocal-Sample"`
Run the discovery sample: `dotnet run discovery`
Try the simple sample: `dotnet run simple`
The model should load successfully and respond to chat prompts!